feat(evolve): Phase 2 of /evolve --mode=loop — ladder + cron self-adjust + typed blocked events (soc-g2qd #phase-2)#397

Merged
boshu2 merged 1 commit into main from feat/evolve-loop-phase2-soc-g2qd on May 21, 2026

@boshu2 boshu2 commented May 21, 2026

Why

Phase 2 of soc-g2qd: ship the CLI enforcement primitives the skill's prompt-text alone can't guarantee.

| Bead | What it gives the operator-loop |
|---|---|
| soc-mlbm | `ao evolve next-work`: 5-step programmatic ladder; agent stops guessing what to claim next |
| soc-un0m | `ao cron self-adjust`: renders cron template + emits JSON spec; replaces manual CronList/Delete/Create per cycle |
| soc-g34d | `ao evolve blocked`: typed blocked events at `.agents/evolve/blocked.jsonl`; agent logs rather than halts |

What changed

| Surface | Change |
|---|---|
| `cli/cmd/ao/evolve_next_work.go` + `_test.go` | New subcommand + L2 integration tests |
| `cli/internal/evolve/ladder/` | 5-step ladder package (shape_filter, grep_siblings, primitive_test, cross_hop_pickup, bug_fallback) with table-driven unit tests |
| `cli/cmd/ao/cron.go` | New top-level `ao cron` command |
| `cli/cmd/ao/cron_self_adjust.go` + `_test.go` | New subcommand; calls evolve.VerifyMarkers + evolve.Render from #394; writes audit row to `.agents/evolve/cron-history.jsonl`; emits JSON spec to stdout (harness orchestrates CronCreate) |
| `cli/cmd/ao/evolve_blocked.go` + `_test.go` | New subcommand: `--reason` (write), `--list [--tail N] [--json]` (read), `--clear <cycle>` (operator) |
| Generated: `cli/docs/COMMANDS.md`, `registry.json`, `docs/cli-skills-map.md` | Regen for 3 new subcommands |
| `evals/agentops-core/cli-command-surface-matrix.json` + smoke fixture | Counts bumped 73/199/272 → 74/202/276 |
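
For readers unfamiliar with the ladder idea, here is a minimal sketch of an ordered cascade in which the first rung that finds work wins. Only the five step names come from this PR. The `Candidate` type, the `Step` shape, and the "match by kind" rule inside each rung are hypothetical stand-ins, not the code in `cli/internal/evolve/ladder/`.

```go
package main

import "fmt"

// Candidate is a hypothetical unit of work the ladder might select.
type Candidate struct {
	ID   string
	Kind string
}

// Step is one rung of the ladder: Try reports whether it found work.
type Step struct {
	Name string
	Try  func(pool []Candidate) (Candidate, bool)
}

// byKind builds a placeholder rung that picks the first candidate of one kind.
// The real rungs presumably apply richer rules; this only illustrates ordering.
func byKind(name, kind string) Step {
	return Step{Name: name, Try: func(pool []Candidate) (Candidate, bool) {
		for _, c := range pool {
			if c.Kind == kind {
				return c, true
			}
		}
		return Candidate{}, false
	}}
}

// DefaultSteps lists the five step names from the PR, in order.
func DefaultSteps() []Step {
	return []Step{
		byKind("shape_filter", "shape"),
		byKind("grep_siblings", "sibling"),
		byKind("primitive_test", "primitive"),
		byKind("cross_hop_pickup", "cross_hop"),
		byKind("bug_fallback", "bug"),
	}
}

// NextWork walks the rungs top to bottom and stops at the first hit.
func NextWork(steps []Step, pool []Candidate) (string, Candidate, error) {
	for _, s := range steps {
		if c, ok := s.Try(pool); ok {
			return s.Name, c, nil
		}
	}
	return "", Candidate{}, fmt.Errorf("no work found across %d steps", len(steps))
}

func main() {
	pool := []Candidate{{ID: "b-1", Kind: "bug"}, {ID: "s-1", Kind: "sibling"}}
	step, c, err := NextWork(DefaultSteps(), pool)
	if err != nil {
		panic(err)
	}
	fmt.Println(step, c.ID) // prints "grep_siblings s-1"
}
```

Because the rungs are plain data, a table-driven test can exercise each one in isolation, and the fallback order stays visible in one place instead of living in prompt text.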

How tested

  • L2 integration: each new subcommand has L2 tests using fixture workspaces
  • L1 unit: 5-step ladder per-step table tests + JSONL schema validation on blocked records
  • Mechanical: go test ./cli/... 0 → 0 failures; cli-command-surface-smoke.sh cli-help-matrix-ok; check-no-tracked-agents.sh exits 0

Counts

CLI heading counts: top 73 → 74, sub 199 → 202, all 272 → 276.

Sibling pattern: cron-history.jsonl + blocked.jsonl follow the cycle-history.jsonl JSONL append-only shape from soc-5qit. Ladder structure mirrors the in-prompt cascade in references/scout-mode.md — making it programmatic per §A5.

[no-sibling for cron-self-adjust] First-of-kind: no prior subcommand emits a cron-spec JSON for harness orchestration. The CLI does the safe work (template render + marker verify + audit row); the harness owns CronCreate.

See: docs/plans/2026-05-21-evolve-loop-epic-design.md §A4, §A5, §A6

Closes-scenario: soc-mlbm#next-work-ladder
Closes-scenario: soc-un0m#cron-self-adjust
Closes-scenario: soc-g34d#typed-blocked-events
Bounded-context: BC5-Runtime
Evidence: cli/cmd/ao/evolve_next_work.go

…ust + typed blocked events (soc-g2qd #phase-2)

Closes 3 Phase-2 sub-beads of soc-g2qd in one PR (same surface cli/cmd/ao/, inter-dependent):

| Bead | Surface |
|---|---|
| soc-mlbm | `ao evolve next-work` — 5-step programmatic ladder (cli/internal/evolve/ladder) |
| soc-un0m | `ao cron self-adjust` — render cron template via evolve.Render; emit JSON spec to stdout (harness orchestrates CronCreate) |
| soc-g34d | `ao evolve blocked` — typed blocked-event log at .agents/evolve/blocked.jsonl (--reason write / --list read / --clear) |

## What changed

| Surface | Change |
|---|---|
| `cli/cmd/ao/evolve_next_work.go` + `_test.go` | New subcommand + L2 integration tests |
| `cli/internal/evolve/ladder/ladder.go` + `_test.go` | 5-step ladder package (shape_filter, grep_siblings, primitive_test, cross_hop_pickup, bug_fallback) with table-driven unit tests |
| `cli/cmd/ao/cron.go` | New top-level `ao cron` command (parent for self-adjust) |
| `cli/cmd/ao/cron_self_adjust.go` + `_test.go` | New subcommand; calls evolve.VerifyMarkers + evolve.Render from #394; writes audit row to .agents/evolve/cron-history.jsonl; emits JSON spec to stdout |
| `cli/cmd/ao/evolve_blocked.go` + `_test.go` | New subcommand: --reason (write), --list [--tail N] [--json] (read), --clear <cycle> (operator-only) |
| Generated: COMMANDS.md, registry.json, cli-skills-map.md | Regen for 3 new subcommands |
| evals/agentops-core canary counts | Bumped: top 73→74, sub 199→202, all 272→276 |

## How tested

- L2 integration: each subcommand has L2 tests using fixture workspaces and asserting structural equality on outputs
- L1 unit: ladder per-step table tests (5 steps × multiple cases each); JSONL schema validation on blocked-event records
- Mechanical: `go test ./cli/...` green; cli-command-surface-smoke.sh green; check-no-tracked-agents.sh green; TestCobraConformance green

Sibling pattern: cron-history.jsonl + blocked.jsonl follow the cycle-history.jsonl JSONL append-only shape from soc-5qit. Ladder is novel but its step structure mirrors the in-prompt cascade in `references/scout-mode.md` — making it programmatic per §A5.

Fitness: tests roughly +33 → ~33/33 new tests passing (5 ladder steps × ~3 cases each + 3 L2 subcommand + 3 L1 schema). go test ./cli/cmd/ao + ./cli/internal/evolve/ladder green.

[no-sibling for cron-self-adjust] First-of-kind: no prior subcommand emits a cron-spec JSON for harness orchestration. The pattern is intentionally minimal — the CLI does the safe work (template render + marker verify + audit row); the harness owns CronCreate.

See: docs/plans/2026-05-21-evolve-loop-epic-design.md §A4, §A5, §A6

Closes-scenario: soc-mlbm#next-work-ladder
Closes-scenario: soc-un0m#cron-self-adjust
Closes-scenario: soc-g34d#typed-blocked-events
Bounded-context: BC5-Runtime
Evidence: cli/cmd/ao/evolve_next_work.go
@boshu2 boshu2 force-pushed the feat/evolve-loop-phase2-soc-g2qd branch from 7613842 to 1c501ec May 21, 2026 18:00
@boshu2 boshu2 merged commit 0759a74 into main May 21, 2026
71 checks passed
@boshu2 boshu2 deleted the feat/evolve-loop-phase2-soc-g2qd branch May 21, 2026 18:09
boshu2 added a commit that referenced this pull request May 22, 2026
…tors (soc-2gd6 #eval-hard-fails) (#402)

## Why

The v2.42.0 release gate (`scripts/ci-local-release.sh`) was red on 8
evals. The 3 score-0/near-0 hard fails are all **eval-staleness behind
legitimate recent refactors** — verified, not gaming or security
weakening. Operator decision: update eval to match source of truth
(executable > contract).

| Eval | Was | Cause | Fix |
|---|---|---|---|
| `hook-manifest-command-counts` | 0 | `session-pr-counter.sh` (PR #362) is the legit 37th hook script; eval hardcoded 43/36 | bump expected counts 43→44, 36→37 |
| `push-worktree landing-plane` | 0.14 | #387 tiered-AGENTS split moved "Landing the Plane" to `AGENTS-WORKFLOW.md` (+ dropped 2 lines) | redirect eval target `AGENTS.md`→`AGENTS-WORKFLOW.md` + restore the 2 dropped policy lines |
| `security-toolchain ci-soft-gate-policy` | 0 | gate is intentionally **HARD** (no `continue-on-error`); job already runs `security-gate.sh --mode quick` + uploads artifacts | drop the stale `continue-on-error` requirement (security stays HARD) |

**Security note:** `security-toolchain-gate` stays a HARD blocking gate.
Only the stale "soft gate" assertion was removed from the eval; the
actual scan + artifact upload + summary-blocking are unchanged.

## How tested
- hook-manifest jq → `hook-manifest-counts-ok`
- security smoke `ci-policy` → `security-toolchain-ci-policy-ok`
- all 7 landing-plane strings present in `AGENTS-WORKFLOW.md`
- shellcheck clean on edited smoke

## Scope honesty
This fixes the 3 **hard** fails only. The release gate still has **5
minor evals (0.71–0.99)** + the **vil/release-smoke** lane — a separate
remediation, deliberately NOT in this PR (no green-washing).

Sibling pattern: same "update eval to match legitimately-changed source
of truth" move as the cli-command-surface canary bumps in #396/#397.

Fitness: release-gate eval hard-fails 3 → 0.

Closes-scenario: soc-2gd6#eval-hard-fails
Bounded-context: BC4-Validation
Evidence: evals/agentops-core/fixtures/security-toolchain-governance-smoke.sh